conjecture 3
- North America > United States (0.28)
- Europe > Switzerland > Vaud > Lausanne (0.04)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
DeMo: Decoupled Momentum Optimization
Peng, Bowen, Quesnelle, Jeffrey, Kingma, Diederik P.
Training large neural networks typically requires sharing gradients between accelerators through specialized high-speed interconnects. Drawing from the signal processing principles of frequency decomposition and energy compaction, we demonstrate that synchronizing full optimizer states and model parameters during training is unnecessary. By decoupling momentum updates and allowing controlled divergence in optimizer states across accelerators, we achieve improved convergence compared to state-of-the-art optimizers. We introduce Decoupled Momentum (DeMo), a fused optimizer and data-parallel algorithm that reduces inter-accelerator communication requirements by several orders of magnitude. This enables training of large neural networks even with limited network bandwidth and heterogeneous hardware. Our method is topology-agnostic and architecture-independent, and it supports scalable clock-synchronous distributed training with negligible compute and memory overhead. Empirical results show that models trained with DeMo match or exceed the performance of equivalent models trained with AdamW, while eliminating the need for high-speed interconnects when pre-training large-scale foundation models. An open-source reference PyTorch implementation is published on GitHub at https://github.com/bloc97/DeMo.
- North America > United States (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
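To make the mechanism in the DeMo abstract concrete, here is a minimal single-tensor sketch of the frequency-decomposition idea it describes. This is not the reference implementation (see the GitHub link above): the function name, the use of a plain DCT over a flat parameter vector, and the top-k coefficient selection are illustrative assumptions.

```python
# Hypothetical sketch of decoupled momentum with frequency decomposition.
import numpy as np
from scipy.fft import dct, idct

def decoupled_momentum_step(momentum, grad, k, beta=0.99):
    """Accumulate grad into local momentum, then split off the k
    highest-energy DCT components as the only part that would be
    communicated; the remainder stays local and is allowed to diverge."""
    momentum = beta * momentum + grad        # local momentum accumulation
    coeffs = dct(momentum, norm="ortho")     # frequency decomposition
    top = np.argsort(np.abs(coeffs))[-k:]    # energy compaction: keep top-k
    fast = np.zeros_like(coeffs)
    fast[top] = coeffs[top]
    shared = idct(fast, norm="ortho")        # the part to exchange with peers
    momentum = momentum - shared             # decouple: drop what was sent
    return momentum, shared
```

In a data-parallel run, only the k selected coefficients behind `shared` would cross the interconnect, which is where the claimed orders-of-magnitude reduction in communication would come from.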
KernelSHAP-IQ: Weighted Least-Square Optimization for Shapley Interactions
Fumagalli, Fabian, Muschalik, Maximilian, Kolpaczki, Patrick, Hüllermeier, Eyke, Hammer, Barbara
The Shapley value (SV) is a prevalent approach for allocating credit to machine learning (ML) entities to understand black-box ML models. Enriching such interpretations with higher-order interactions is indispensable for complex systems, where the Shapley Interaction Index (SII) is a direct axiomatic extension of the SV. While it is well known that the SV yields an optimal approximation of any game via a weighted least-squares (WLS) objective, an extension of this result to SII has been a long-standing open problem, which even led to the proposal of an alternative index. In this work, we characterize higher-order SII as a solution to a WLS problem, which constructs an optimal approximation via SII and $k$-Shapley values ($k$-SII). We prove this representation for the SV and pairwise SII and give empirically validated conjectures for higher orders. As a result, we propose KernelSHAP-IQ, a direct extension of KernelSHAP for SII, and demonstrate state-of-the-art performance for feature interactions.
- Europe > Austria > Vienna (0.14)
- Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
- Europe > Netherlands (0.04)
- (2 more...)
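Since the abstract builds on the classical result that the SV solves a WLS problem, a small exact sketch of that baseline may help; extending the design matrix with pairwise columns is, roughly, the direction KernelSHAP-IQ takes for SII. The helper name, the toy game, and the large boundary weights (used in place of hard efficiency constraints) are assumptions, not the paper's construction.

```python
# Exact KernelSHAP-style WLS solve for plain Shapley values (toy scale).
from itertools import combinations
from math import comb
import numpy as np

def shapley_via_wls(value, n):
    """Shapley values of an n-player game `value: frozenset -> float`,
    recovered as the solution of a weighted least-squares problem."""
    rows, targets, weights = [], [], []
    for size in range(n + 1):
        for S in combinations(range(n), size):
            x = np.zeros(n + 1)
            x[0] = 1.0                                  # intercept for v({})
            x[1 + np.array(S, dtype=int)] = 1.0         # coalition indicator
            rows.append(x)
            targets.append(value(frozenset(S)))
            if 0 < size < n:                            # Shapley kernel weight
                weights.append((n - 1) / (comb(n, size) * size * (n - size)))
            else:                                       # soft efficiency constraint
                weights.append(1e7)
    X, y = np.array(rows), np.array(targets)
    sw = np.sqrt(np.array(weights))
    phi, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return phi[1:]                                      # drop the intercept

# Toy 3-player game v(S) = |S|**2: symmetry gives a Shapley value of 3 each.
print(shapley_via_wls(lambda S: len(S) ** 2, 3))
```

Because all $2^n$ coalitions are enumerated here, the WLS solution matches the exact SV; KernelSHAP's practical value, and by extension KernelSHAP-IQ's, comes from solving the same objective on a sampled subset of coalitions.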
An Analysis of the Expressiveness of Deep Neural Network Architectures Based on Their Lipschitz Constants
Zhou, SiQi, Schoellig, Angela P.
Deep neural networks (DNNs) have emerged as a popular mathematical tool for function approximation due to their capability of modelling highly nonlinear functions. Their applications range from image classification and natural language processing to learning-based control. Despite their empirical successes, there is still a lack of theoretical understanding of the representative power of such deep architectures. In this work, we provide a theoretical analysis of the expressiveness of fully-connected, feedforward DNNs with 1-Lipschitz activation functions. In particular, we characterize the expressiveness of a DNN by its Lipschitz constant. By leveraging random matrix theory, we show that, given sufficiently large and randomly distributed weights, the expected upper and lower bounds of the Lipschitz constant of a DNN, and hence its expressiveness, increase exponentially with depth and polynomially with width, which gives rise to the benefit of depth in DNN architectures for efficient function approximation. This observation is consistent with established results based on alternative expressiveness measures of DNNs. In contrast to most of the existing work, our analysis based on the Lipschitz properties of DNNs is applicable to a wider range of activation nonlinearities and potentially allows us to make sensible comparisons between the complexity of a DNN and that of the function to be approximated by the DNN. We consider this work to be a step towards understanding the expressive power of DNNs and towards designing appropriate deep architectures for practical applications such as system control.
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
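The exponential-in-depth behaviour in the abstract above is easy to see numerically through the standard spectral-norm product, which upper-bounds the Lipschitz constant of a network with 1-Lipschitz activations. The width, depths, and Gaussian scaling below are illustrative assumptions, not the paper's exact setting.

```python
# Growth of the spectral-norm Lipschitz upper bound with depth.
import numpy as np

rng = np.random.default_rng(0)
width, trials = 64, 20
for depth in (1, 2, 4, 8):
    bounds = []
    for _ in range(trials):
        prod = 1.0
        for _ in range(depth):
            # Entries ~ N(0, 1/width), a common random initialization.
            W = rng.normal(scale=1.0 / np.sqrt(width), size=(width, width))
            prod *= np.linalg.norm(W, 2)   # spectral norm of the layer
        bounds.append(prod)
    print(f"depth {depth}: mean Lipschitz upper bound ~ {np.mean(bounds):.2f}")
```

With this scaling each layer's spectral norm concentrates near 2, so the bound grows roughly like 2^depth, illustrating the exponential-in-depth trend the abstract reports.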
Beneath the valley of the noncommutative arithmetic-geometric mean inequality: conjectures, case-studies, and consequences
Recht, Benjamin, Re, Christopher
Randomized algorithms that base iteration-level decisions on samples from some pool are ubiquitous in machine learning and optimization. Examples include stochastic gradient descent and randomized coordinate descent. This paper makes progress toward theoretically evaluating the difference in performance between sampling with- and without-replacement in such algorithms. Focusing on least-mean-squares optimization, we formulate a noncommutative arithmetic-geometric mean inequality that would prove that the expected convergence rate of without-replacement sampling is faster than that of with-replacement sampling. We demonstrate that this inequality holds for many classes of random matrices and for some pathological examples as well. We provide a deterministic worst-case bound on the discrepancy between the two sampling models, and explore some of the impediments to proving this inequality in full generality. We detail the consequences of this inequality for stochastic gradient descent and the randomized Kaczmarz algorithm for solving linear systems.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > New York (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.95)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.69)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.68)
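The with- versus without-replacement gap from the last abstract is easy to probe empirically with the randomized Kaczmarz method it mentions. Everything below (problem size, epoch count, seed) is an illustrative assumption; without-replacement usually wins in such runs, which is what the conjectured inequality would explain.

```python
# Randomized Kaczmarz: with- vs. without-replacement row sampling.
import numpy as np

rng = np.random.default_rng(1)
n, d = 100, 80
A = rng.normal(size=(n, d))
x_true = rng.normal(size=d)
b = A @ x_true                                # consistent linear system

def kaczmarz_epoch(x, order):
    for i in order:
        a = A[i]
        x = x + (b[i] - a @ x) / (a @ a) * a  # project onto row i's hyperplane
    return x

x_with = np.zeros(d)
x_without = np.zeros(d)
for _ in range(5):                            # five epochs of n updates each
    x_with = kaczmarz_epoch(x_with, rng.integers(0, n, size=n))
    x_without = kaczmarz_epoch(x_without, rng.permutation(n))
print("with replacement:   ", np.linalg.norm(x_with - x_true))
print("without replacement:", np.linalg.norm(x_without - x_true))
```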